23 research outputs found

    Adaptive vocabularies for transcribing multilingual broadcast news

    Get PDF

    Introducing Linguistic Constraints into Statistical Language Modeling

    No full text
    Building robust stochastic language models is a major issue in speech recognition systems. Conventional word-based n-gram models do not capture any linguistic constraints inherent in speech. In this paper the notion of function and contentwords #open#closed word classes# is used to provide linguistic knowledge that can be incorporated into language models. Function words are articles, prepositions, personal pronouns # contentwords are nouns, verbs, adjectives and adverbs. Based on this class de#nition resulting in function and contentword markers, a new language model is de#ned. A combination of the word-based model with this new model will be introduced. The combined model shows modest improvements both in perplexity results and recognition performance. 1
    corecore